Introduction

The recent authorization of a sequencing platform for clinical use
by the Food and Drug Administration will expand and accelerate
the use of genetic information in medical care. Progress is particularly
impressive in the deployment of sequencing tools for neonatal
diagnostics. Commoditization of genome-wide genotyping and
sequencing is happening just as rapidly outside of the medical setting
- prominently through companies offering "direct to consumer"
(DTC) services. There is full awareness of the need to protect these
data while simultaneously supporting their use in research.
Here, we discuss how protection of genome data from medical and 
non-medical sources needs to be reframed considering the mutual 
implications of personal decision, online social networks and con- 
sequences to relatives. 

On personal decisions 

Paradoxically, genomics is an attractive field for individual or col- 
lective altruism - many people are willing to place their genome 
data in the public domain, and to actively engage in genomic 
research. The academic community is also calling for definitive
actions to support global data sharing. Many research participants
count on the protection of their identity. However, current
strategies have proven insufficient to stop sophisticated attacks
on genetic data. A recent study demonstrated the feasibility of
re-identifying DNA donors from a public research database by using
information available from popular genealogy websites. Attackers
can also take advantage of gaps in the protection of other sources
of data, for example census and voter lists, hospital insurance
reports, and, increasingly, online social networks (see below).
Genome data in the wrong hands could have undesirable conse- 
quences: from discrimination, or release of paternity, ancestry or 
other data that the participant did not intend to be public, to more 
prosaic usages such as targeted advertisements based on genome 
information. 

Genome and online social networks 

Online social platforms are convenient sites for posting data, but
they are susceptible to "multilayer attacks": by simultaneously
aggregating data from online social networks (e.g., Facebook),
health-related websites (e.g., patientslikeme.com), platforms
for sharing genome data (e.g., OpenSNP.org), family history resources
(e.g., ancestry.com), research datasets (e.g., 1000 Genomes Project),
and public records (e.g., voter registration forms), an
attacker can de-anonymize the owner of an anonymized genome and/or
infer the genomic data of his/her family members. We illustrate
in Figure 1A the feasibility and ease of cross-identification of a
given individual across various genetic and non-genetic platforms,
including the reconstitution of parts of the family pedigree.
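
The cross-identification step described above can be illustrated with a minimal sketch. It assumes that an "anonymized" genome profile and a social network profile both expose the same quasi-identifiers (age, gender, ZIP code); every record, name, and field below is invented for illustration and is not taken from any real platform. The function name link_records is likewise hypothetical.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]

# Hypothetical toy data: nothing here comes from a real website or person.
genome_profiles: List[Record] = [
    {"genome_id": "g-001", "age": "34", "gender": "F", "zip": "10001"},
    {"genome_id": "g-002", "age": "51", "gender": "M", "zip": "94110"},
]
social_profiles: List[Record] = [
    {"name": "Alice Example", "age": "34", "gender": "F", "zip": "10001"},
    {"name": "Bob Example", "age": "51", "gender": "M", "zip": "94110"},
    {"name": "Carol Example", "age": "51", "gender": "M", "zip": "94110"},
]

# Attributes assumed to be visible on both platforms.
QUASI_IDENTIFIERS = ("age", "gender", "zip")


def link_records(
    genomes: Sequence[Record],
    profiles: Sequence[Record],
    keys: Sequence[str] = QUASI_IDENTIFIERS,
) -> Dict[str, List[str]]:
    """Return, for each genome ID, the names whose quasi-identifiers match."""
    index: Dict[tuple, List[str]] = {}
    for profile in profiles:
        index.setdefault(tuple(profile[k] for k in keys), []).append(profile["name"])
    return {
        g["genome_id"]: index.get(tuple(g[k] for k in keys), [])
        for g in genomes
    }


matches = link_records(genome_profiles, social_profiles)
for genome_id, names in matches.items():
    status = "re-identified" if len(names) == 1 else "ambiguous or unmatched"
    print(genome_id, names, status)
```

In this toy setting a genome is re-identified only when exactly one profile shares all of its quasi-identifiers; each additional data source an attacker can join on tends to shrink the set of candidates.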

On kinship issues 

Kin aspects of genomics were well publicized by the recent controversy
regarding the public release of the genome of Henrietta Lacks
(August 1, 1920 - October 4, 1951). HeLa, a cell line established
from Lacks, has been used for decades in research laboratories
worldwide. Recently, HeLa cells were sequenced and the genome
data posted online without the consent of her relatives, who subsequently
complained that this amounted to revealing private information
about the family. The multilayer attacks mentioned above can
reconstruct phylogenies from revealed genomes and open the door
to genetic prediction of family members. The amount of kin privacy
lost from such attacks can be precisely estimated (Figure 1B).
As more individuals have their genome sequenced or genotyped
in coming years, the loss of privacy of family members
through multilayer attacks will increase if no action is taken.

Solutions from computer science 

There is little doubt that genome privacy will be challenged - in
particular if the medical establishment relies solely on legal deterrents
and conventional protection of stored data, or if it resorts to ineffective
deidentification and anonymization of genome data shared
for the purpose of research. However, personal genetic tests and
genomic research are possible without jeopardizing the genomic
privacy of the individual or of family members. In particular, IT
security provides a trove of solutions. These include using efficient
cryptographic techniques for privacy-preserving personalized
medicine, and for genomic research. With such approaches,
genomic data are always stored in encrypted form and medical personnel
or researchers can access only the subset of genomic information
required for healthcare or dedicated studies. Similarly, there
are obfuscation-based solutions to use genomic data in research
settings in a privacy-preserving way.

Some genome researchers may be tempted to belittle the threat 
raised by the possible leakage of genomic data. This is a mistake, 
because progress in genetics is likely to make these data more and 
more meaningful. In addition, if it appears that genomic data are 
not properly protected, people could start distrusting genetics, with 
negative consequences for the progress of medicine. Protection 
needs to consider both the interest of the individual and of relatives. 
It is important to learn from errors in Internet security over the last 
decades. In that field, tools and solutions are often lagging behind 
threats. 

The first meeting exclusively dedicated to genomic privacy took 
place in October 2013 at the Leibniz Center for Informatics in 
Dagstuhl, Germany (http://www.dagstuhl.de/13412). As one of 
the outcomes, the community set up a web site reporting the efforts 
and progress on this topic: https://genomeprivacy.org/. Notably, this 
site contains the list of research groups active in this field, as well 
as basic information to facilitate the understanding of this novel 
field. It is our conviction that by pooling together the skills of 
geneticists, law scholars, ethicists and computer scientists, we are 
still in time to strike an appropriate balance between accessibility to 
genome data and their protection. 
Figure 1. Attacks on genomic privacy. (A) Multilayer attacks using data from genomic and non-genomic platforms. An attacker can obtain
the anonymized genomic data of an individual from one of the genome data websites (e.g., openSNP.org). Then, the attacker can
de-anonymize the owner of the genome (i.e., learn his/her identity) by matching his/her phenotypic, demographic and administrative information
(e.g., profile picture, age, gender, ZIP code) against the individual's online social network profile. Once the individual is identified, the
attacker can also determine his/her family members from a family history resource (e.g., ancestry.com) and infer the genomic data of family
members from the individual's retrieved genome. For example, owners of some genomes uploaded to openSNP can be de-anonymized using
their Facebook profiles. For 6 individuals who publicly revealed the names of some of their relatives on Facebook, 29 familial relationships
could be identified. (B) Decrease in genomic privacy of the target person (circled in red) when the genomes of his/her family members are
gradually revealed. The health privacy of family members can be quantified. For example, two single nucleotide polymorphisms (rs7412
and rs429358) of the Apolipoprotein E (ApoE) gene are associated with increased risk for Alzheimer's disease. The identification in several
members of the pedigree of a carrier status for those risk alleles can reveal the ApoE4 status of the target person to the attacker.
This article raises a very important issue: the difficulty of providing privacy for genetic information in the
light of inheritance. I cannot stress enough how important this aspect is, since it requires data protection 
measures, such as the mentioned cryptographic or obfuscation-based approaches, to be utterly 
restrictive. The article gives clear examples where -- intentionally or unintentionally -- leaked information 
allowed harmful inferences. The authors combine information from public genomes and social networks 
to infer information about people who have not released any information about their genes. These 
examples should be taken very, very seriously, since they are raised only at the very beginning of the 
scientific development. As the paper argues we can expect the use of genomics to significantly grow. 
Genetic information can prove at least as harmful as location information provided by cell phones, but it is 
impossible (for now) to change it. It is therefore a scientific and societal challenge to protect genomic 
information much better than we protect our information in current telecommunication networks. The 
article references excellent work from the computer security community that can certainly provide a
guiding direction, but even these mechanisms necessarily leak some information about the
genomes. I hope that medical and computer science researchers will take the challenge described by the
article seriously and look for mechanisms that control all of the information in every medical or non-medical
information system based on direct or indirect genomic data.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that 
it is of an acceptable scientific standard. 

Competing Interests: No competing interests were disclosed. 
Xiaoqian Jiang 
This is a timely commentary on privacy, kin, and genomics. Today, many DNA donors are still unaware
of the potential impact of information leakage on the family when their genome data are made public. This
problem is becoming more critical as the younger generation reveals themselves and family members on
online social networks and open family history resources make it possible to link individuals. I found this
topic to be extremely important.



I have read this submission. I believe that I have an appropriate level of expertise to confirm that 
it is of an acceptable scientific standard. 

Competing Interests: No competing interests were disclosed. 
XiaoFeng Wang 

Centre for Security Informatics, School of Informatics and Computing, University of Indiana at 
Bloomington, Bloomington, IN, USA 
This paper discusses the challenge of protecting human genome data, particularly its unique feature in
that one's DNA data can be used to infer the private health information of those genetically related to 
them. The authors talk about the conflict between the perception that the decision on releasing one's DNA 
materials is personal, and how it can actually impact on the privacy of their kin. They further sketch a 
technique for quantifying such an information leak, and demonstrate that the threat is realistic, given the 
de-anonymization attack that can happen through the booming online social networks. I feel that this 
article provides useful information for raising the awareness of the uniqueness and significance of 
genome privacy. This, hopefully, will lead to a broad, in-depth conversation among genomics 
researchers, security and privacy researchers, bioethics experts, genomics industry, policy makers and 
the public on how to effectively regulate the dissemination of human DNA data to facilitate scientific 
research, without undermining DNA donors' privacy and well-being. 

I have read this submission. I believe that I have an appropriate level of expertise to confirm that 
it is of an acceptable scientific standard. 

Competing Interests: No competing interests were disclosed. 